Skip to content

Uniform clip sampling + multi-clip sampling function#57

Open
themattinthehatt wants to merge 13 commits into
mainfrom
clipsampling
Open

Uniform clip sampling + multi-clip sampling function#57
themattinthehatt wants to merge 13 commits into
mainfrom
clipsampling

Conversation

@themattinthehatt

Copy link
Copy Markdown
Collaborator

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
Current repo only has a single clip selection strategy (manual).

What does this PR do?
Adds a uniform clip sampling strategy option, and updates the CLI to take sampling strategy as input.

References

Builds on #39.

How has this PR been tested?

Added new unit tests, checked all tests are passing locally before making PR.

Is this a breaking change?

Yes; the previous clip extraction entry point was extract-clip; this has been updated to extract-clips to work with multiple clips; in addition, the command line args changed to allow for multiple clip sampling strategies.

Does this PR require an update to the documentation?

Minor update renaming the clip extraction entrypoint from extract-clip to extract-clips.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@themattinthehatt themattinthehatt requested a review from sfmig May 29, 2026 19:30

@sfmig sfmig left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @themattinthehatt, thanks for this!

I made a few suggestions: two of them are two options to try to keep the kwargs within static type checking; the other one is a decoupling suggestion that I think can help simplify tests and has other benefits too.

I think the decoupled approach would be my preferred one in terms of clarity, but it may be a personal preference too. I leave you all the options, let me know what you think. I requested changes in case you want to go for any one of them, I'd be happy to have another look. I haven't checked the tests in detail for the same reason.

Comment thread poseinterface/clips.py Outdated
Comment thread poseinterface/clips.py Outdated
Comment on lines +301 to +311
if args.sampling == "uniform":
if args.num_clips is None:
raise SystemExit(
"error: --num_clips is required when --sampling uniform"
)
kwargs["num_clips"] = args.num_clips
elif args.sampling == "manual":
if not args.start_frames:
raise SystemExit(
"error: --start_frames is required when --sampling manual"
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks are enforced twice: once in main and once in extract_clips. It makes sense to have a CLI-friendly error (i.e. no traceback, to report wrong inputs) and an API-friendly one (with traceback) but we can reduce code duplication a bit more with the suggestion below, that "translates" the API error to a CLI one.

(It assumes the first suggestion for the extract_clips signature above was applied)

    def main(args: argparse.Namespace) -> None:
	    try:
	        extract_clips(
	            args.video_path,
	            args.duration,
	            args.sampling,
	            num_clips=args.num_clips,
	            start_frames=args.start_frames,
	        )
	    except ValueError as e:
	        raise SystemExit(f"error: {e}")

Or if we go for the dataclass approach, the purpose of main shifts to instead catch None input cases (rather than simply translate the error):

def main(args):
    if args.sampling == "uniform":
        if args.num_clips is None:
            raise SystemExit("error: --num_clips is required when --sampling uniform")
        sampling = Uniform(num_clips=args.num_clips)
		# or for the decoupled approach we can directly have:
		# extract_clips_uniform(args.video_path, args.duration, args.num_clips)
    elif args.sampling == "manual":
        if not args.start_frames:
            raise SystemExit("error: --start_frames is required when --sampling manual")
        sampling = Manual(frames=args.start_frames)
		# or for the decoupled approach we can directly have:
		# extract_clips(args.video_path, args.duration, args.start_frames)

	# the following call would not be needed in the decoupled approach
    extract_clips(args.video_path, args.duration, sampling)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the first option better; the dataclass approach feels overly complicated to me (at least at this point)

Comment thread poseinterface/clips.py Outdated
Comment on lines +125 to +129
def extract_clips(
video_path: Path,
duration: int,
sampling: str,
**kwargs: Any,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the function signature would be explicit about the different keyword arguments, rather than managing this via the docstring. That way we don't lose the static type checking on those inputs (i.e. right now mypy won't flag num_clips if it is not an int).

One option to do this would be to define all possible inputs and set them to None by default, then branching based on that:

def extract_clips(
    video_path: Path,
    duration: int,
    sampling: Literal["manual", "uniform"],
    *,
    num_clips: int | None = None,
    start_frames: list[int] | None = None,
) -> list[tuple[Path, Path | None]]:
    if sampling == "manual":
        if start_frames is None:
            raise ValueError("sampling='manual' requires start_frames")
        frames = list(start_frames)
    elif sampling == "uniform":
        if num_clips is None:
            raise ValueError("sampling='uniform' requires num_clips")
        n_frames = sio.load_video(Path(video_path)).shape[0]
        frames = _uniform_start_frames(num_clips, duration, n_frames)
    else:
        raise ValueError(f"Unknown sampling strategy {sampling!r}")

    return [extract_clip(video_path, sf, duration) for sf in frames]

Claude also suggested an alternative approach using dataclasses that may be more convenient if we expect the number of sampling strategies/input parameters to grow. Something like the snippet below:

from dataclasses import dataclass

@dataclass
class Uniform:
    num_clips: int
    def start_frames(self, duration: int, n_frames: int) -> list[int]:
        return _uniform_start_frames(self.num_clips, duration, n_frames)

@dataclass
class Manual:
    frames: list[int]
    def start_frames(self, duration: int, n_frames: int) -> list[int]:
        return list(self.frames)

Sampling = Uniform | Manual

def extract_clips(
    video_path: Path, duration: int, sampling: Sampling
) -> list[tuple[Path, Path | None]]:
    n_frames = sio.load_video(Path(video_path)).shape[0]
    frames = sampling.start_frames(duration, n_frames)
    return [extract_clip(video_path, sf, duration) for sf in frames]

To call it, you would do extract_clips(p, 100, Uniform(num_clips=5)). This one feels a bit more complex but has two nice benefits:

  • you cannot make a Uniform instance without num_clips, so we can do away with the "API" validation / error handling
  • if we add more sampling strategies (which I think we will), we don't need to touch extract_clips

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about being explicit with keyword args for improved type hints/checking, good idea 👌

Comment thread poseinterface/clips.py
return [extract_clip(video_path, sf, duration) for sf in start_frames]


def _uniform_start_frames(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current approach in extract_clips bundles together input preparation (i.e. computing start_frames based on the sampling strategy) and the action (i.e. the extraction of the frames). The same applies for the suggestions I gave. This makes a more convenient function call, but the coupling of these two objectives has some downsides.

We can decouple it (initially this is what I had in mind). The extract_clips function can take start_frames as input, and then we have several functions that generate these start frames (e.g., the current _uniform_start_frames). For convenience, we can have wrappers that we can wire to the CLI. These wrappers combine extract_clips with a start-frame-production function:

# "core" function, only extracts
def extract_clips(
    video_path: Path,
    duration: int,
    start_frames: list[int],
) -> list[tuple[Path, Path | None]]:
	# potentially some validation here
    return [extract_clip(video_path, sf, duration) for sf in start_frames]

# wrapper, prepares the input. We link this to the CLI
def extract_clips_uniform(video_path, duration, num_clips):
    n_frames = sio.load_video(video_path).shape[0]
    frames = _uniform_start_frames(num_clips, duration, n_frames)
    return extract_clips(video_path, duration, frames)

This decoupled approach has a few benefits:

  • simpler and smaller unit tests (because each function does one thing)
  • it scales nicely to other strategies (contributors need to add a "start-frames" function and then write the thin wrapper that collects the call; extract_clips stays the same)
  • the dispatch that translates a string (uniform) into a function (_uniform_start_frames) now occurs at the CLI main (it is removed entirely from here). We would need to keep the None-guards in main as mentioned above tho.

I think it would also help with the validation of inputs:

  • Right now, start_frames values aren't validated as a list of positive integers anywhere (each frame is only checked individually inside the loop that calls extract_clip, which is late to catch the error). This is easy to miss with the current approach I think, because validation is a bit spread across different places. The start frame validation would belong nicely within the decoupled extract_clips, and that way we can report all bad frames at once instead of dying on the first.
  • Claude suggests to use a small validation function, to use in extract_clip and extract_clips. But in the multi-clip approach, we would validate the inputs up-front.
def _validate_clip_request(start_frame: int, duration: int) -> None:
    if start_frame < 0:
        raise ValueError(f"start_frame must be non-negative, got {start_frame}")
    if duration <= 0:
        raise ValueError(f"duration must be positive, got {duration}")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S.: Claude says "argparse can enforce the conditional requirement via subcommands". It would look something like this:

extract-clips uniform --video_path v.mp4 --duration 100 --num_clips 5
extract-clips manual  --video_path v.mp4 --duration 100 --start_frames 0 500 1000

I have never done this, and I think I prefer the argparse to have no "important" logic, but just in case you want to look into it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another great suggestion @sfmig! I also agree that I don't want argparse performing this logic, it's simple in the manual case but we'll likely have more/more complex logic checks for other sampling strategies, so this should be handled in extract_clips_*.

Comment thread poseinterface/clips.py Outdated
"Supported strategies: 'manual', 'uniform'."
)

return [extract_clip(video_path, sf, duration) for sf in start_frames]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe to avoid confusion between both functions:

Suggested change
return [extract_clip(video_path, sf, duration) for sf in start_frames]
return [extract_single_clip(video_path, sf, duration) for sf in start_frames]

@themattinthehatt

Copy link
Copy Markdown
Collaborator Author

@sfmig thanks for the great feedback! I incorporated many of you suggestions. I also imagine at some point we'll build a proper CLI module and move wrapper()/main()/parse_args() out of this module. Is that what you had in mind, or do you want to keep all of this functionality in one place?

I have one test failing: https://github.com/neuroinformatics-unit/poseinterface/actions/runs/27362615831/job/80853717159?pr=57. Do you know why this is the case? I didn't touch this module on my side, is this related to some update in movement?

@niksirbi

Copy link
Copy Markdown
Member

I have one test failing: neuroinformatics-unit/poseinterface/actions/runs/27362615831/job/80853717159?pr=57. Do you know why this is the case? I didn't touch this module on my side, is this related to some update in movement?

Ah yeah, that's because of the breaking change of dimension names in movement v0.17.0. To unblock the ongoing PRs, we could temporarily pin movement to v0.16.0 here and then we can follow up with a separate PR bringing the codebase up-to-speed with movement v0.17.0.

@niksirbi

niksirbi commented Jun 11, 2026

Copy link
Copy Markdown
Member

@themattinthehatt, I've quickly fixed the movement issue (it was an easy fix).

If you give #59 a quick approval, we can merge that first and then rebase your 2 PRs to main (that would be the 'cleaner' fix).

@themattinthehatt themattinthehatt requested a review from sfmig June 12, 2026 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants